Skip to content
This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

summary: add real time cost to log collector #480

Merged
merged 4 commits into from
Sep 2, 2020

Conversation

YuJuncen
Copy link
Collaborator

Signed-off-by: Hillium maruruku@stu.csust.edu.cn

What problem does this PR solve?

implement #478

What is changed and how it works?

We make serval changes to make the log of time cost correct:

  1. collect kv and size each time of importing.

    br/pkg/restore/import.go

    Lines 328 to 331 in d9d6207

    }
    summary.CollectSuccessUnit(summary.TotalKV, 1, file.TotalKvs)
    summary.CollectSuccessUnit(summary.TotalBytes, 1, file.TotalBytes)
    }
  2. a log of real time cost is added to the summary.
  3. collect the success unit when checksumming, so the total cost field would be (the time of restoring + the time of checksumming) instead of just the time of restoring.

Check List

Tests
image

  • Manual test (add detailed scripts or steps below)

Release Note

  • Add real time cost to log.

Signed-off-by: Hillium <maruruku@stu.csust.edu.cn>
@kennytm
Copy link
Collaborator

kennytm commented Aug 31, 2020

what is "TiKV service time"?

@YuJuncen
Copy link
Collaborator Author

YuJuncen commented Aug 31, 2020

what is "TiKV service time"?

The time how long the cluster handling requests of this BR task. (i.e. the summary of time waiting for split + scatter + import + checksum)

It's hard to name this time, "CPU time" isn't accurate enough since it doesn't really need CPUs to run it. Any better idea? @kennytm

@kennytm
Copy link
Collaborator

kennytm commented Aug 31, 2020

backup time / restore time ?

@YuJuncen
Copy link
Collaborator Author

YuJuncen commented Sep 1, 2020

If we did so, the summary package must be aware of what task we are doing, which may lead to unnecessary coupling. How about a more generic name like task time?

($(task_name) time would be good.)

Copy link
Member

@overvenus overvenus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

pkg/restore/client.go Outdated Show resolved Hide resolved
@kennytm
Copy link
Collaborator

kennytm commented Sep 1, 2020

/run-integration-test

@kennytm
Copy link
Collaborator

kennytm commented Sep 1, 2020

https://internal.pingcap.net/idc-jenkins/blue/organizations/jenkins/br_ghpr_unit_and_integration_test/detail/br_ghpr_unit_and_integration_test/2781/pipeline

[2020-09-01T06:59:54.678Z] [2020/09/01 14:59:54.652 +08:00] [INFO] [collector.go:187] ["Table backup Failed summary : total backup ranges: 2, total success: 0, total failed: 2"] ["backup total regions"=2] [BackupTS=419153469155573761] [unitName="range start:7480000000000000335f720000000000000000 end:7480000000000000335f72ffffffffffffffff00"] [error="rpc error: code = Unavailable desc = timed out waiting for server handshake"] [errorVerbose="rpc error: code = Unavailable desc = timed out waiting for server handshake\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20200729012136-4e113ddee29e/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20200729012136-4e113ddee29e/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/backup.(*pushDown).pushBackup\n\t/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/backup/push.go:117\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/backup/client.go:501\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRanges.func2.1\n\t/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/backup/client.go:430\ngithub.com/pingcap/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/utils/worker.go:62\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20190911185100-cd5d95a43a6e/errgroup/errgroup.go:57\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"] [unitName="range start:7480000000000000335f69800000000000000100 end:7480000000000000335f698000000000000001fb"] [error="rpc error: code = Unavailable desc = timed out waiting for server handshake"] [errorVerbose="rpc error: code = Unavailable desc = timed out waiting for server handshake\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20200729012136-4e113ddee29e/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20200729012136-4e113ddee29e/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/backup.(*pushDown).pushBackup\n\t/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/backup/push.go:117\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/backup/client.go:501\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRanges.func2.1\n\t/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/backup/client.go:430\ngithub.com/pingcap/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/br_ghpr_unit_and_integration_test/go/src/github.com/pingcap/br/pkg/utils/worker.go:62\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20190911185100-cd5d95a43a6e/errgroup/errgroup.go:57\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]

@kennytm
Copy link
Collaborator

kennytm commented Sep 1, 2020

/run-integration-test

@kennytm
Copy link
Collaborator

kennytm commented Sep 1, 2020

https://internal.pingcap.net/idc-jenkins/blue/organizations/jenkins/br_ghpr_unit_and_integration_test/detail/br_ghpr_unit_and_integration_test/2782/pipeline

[2020-09-01T07:49:13.586Z] TiFlash started...
[2020-09-01T07:49:13.586Z] TiFlash seems doesn't started, retrying...
[2020-09-01T07:49:13.586Z] Logging trace to /tmp/backup_restore_test/tiflash-stdout.log
[2020-09-01T07:49:13.586Z] Logging errors to /tmp/backup_restore_test/tiflash-stderr.log
[2020-09-01T07:49:16.108Z] *===== Running test tests/br_tiflash/run.sh... =====*
[2020-09-01T07:49:16.108Z] ERROR 1105 (HY000) at line 1: the tiflash replica count: 1 should be less than the total tiflash server count: 0

@kennytm
Copy link
Collaborator

kennytm commented Sep 1, 2020

/run-integration-test

Copy link
Collaborator

@kennytm kennytm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Hillium <maruruku@stu.csust.edu.cn>
@ti-srebot ti-srebot removed the status/LGT1 LGTM1 label Sep 2, 2020
@ti-srebot ti-srebot added the status/LGT2 LGTM2 label Sep 2, 2020
@kennytm kennytm merged commit 131ae30 into pingcap:master Sep 2, 2020
@ti-srebot
Copy link
Contributor

cherry pick to release-3.1 in PR #485

@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #486

ti-srebot added a commit that referenced this pull request Sep 3, 2020
Signed-off-by: Hillium <maruruku@stu.csust.edu.cn>
3pointer pushed a commit that referenced this pull request Sep 8, 2020
* summary: add real time cost to log collector

Signed-off-by: Hillium <maruruku@stu.csust.edu.cn>

* summary: rename TiKV service time to $name time

Signed-off-by: Hillium <maruruku@stu.csust.edu.cn>

* restore: fix a typo

Signed-off-by: Hillium <maruruku@stu.csust.edu.cn>

Co-authored-by: Hillium <maruruku@stu.csust.edu.cn>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants